Detections API (v1/text/contents) integration with base interface pattern and extensible client architecture #14
Conversation
3c0323a to
548e85a
Compare
m-misiura
left a comment
You should run pre-commit to make this code adhere to the NeMo style
import logging
from typing import Any, Dict, List

from .base import BaseDetectorClient, DetectorResult
module imports are handled inconsistently, e.g. here they are relative, but in actions.py they are absolute, e.g.
from nemoguardrails.library.detector_clients.base import DetectorResult
from nemoguardrails.library.detector_clients.detections_api import DetectionsAPIClient
Fixed - changed all imports to absolute. actions.py required absolute imports because NeMo's action discovery uses importlib.util.spec_from_file_location() which loads the module without package context, causing relative imports to fail. Updated detections_api.py to match for consistency.
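The failure mode described above can be reproduced in isolation. This is a minimal, hypothetical sketch (file and module names are illustrative) of why a relative import breaks when a module is loaded via importlib.util.spec_from_file_location() without package context:

```python
import importlib.util
import os
import pathlib
import tempfile

# Hypothetical single-file module using a relative import, mimicking detections_api.py
source = "from .base import DetectorResult\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "detections_api.py")
    pathlib.Path(path).write_text(source)

    spec = importlib.util.spec_from_file_location("detections_api", path)
    module = importlib.util.module_from_spec(spec)
    try:
        # Loaded without package context, so the relative import cannot resolve
        spec.loader.exec_module(module)
        relative_import_failed = False
    except ImportError:
        relative_import_failed = True

print(relative_import_failed)  # True
```

Absolute imports sidestep this entirely, which is why the fix standardizes on them.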
Returns:
    DetectorResult with parsed detection outcome
"""
if http_status != 200:
can this distinguish between e.g. 404: Detector not found and 422: Validation error (invalid request)?
if http_status != 200:
return DetectorResult(
allowed=False,
score=0.0,
reason=f"HTTP {http_status} error",
label="ERROR",
detector=self.detector_name,
metadata={"http_status": http_status}
)
see Detector API spec: https://foundation-model-stack.github.io/fms-guardrails-orchestrator/docs/api/openapi_detector_api.yaml
Fixed - added specific handling for 404 (NOT_FOUND) and 422 (VALIDATION_ERROR) per Detections API spec.
in terms of extracting the HTTP status mapping, what do you think about using a dict instead of if/elif?
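As a sketch of the dict-based approach suggested here (the mapping contents and the DetectorResult shape are simplified stand-ins, not the actual implementation):

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class DetectorResult:
    """Simplified stand-in for the real result model."""
    allowed: bool
    score: float
    reason: str
    label: str
    detector: str
    metadata: Dict[str, Any] = field(default_factory=dict)

# Hypothetical status-to-label mapping following the Detector API spec
STATUS_LABELS = {
    404: ("NOT_FOUND", "Detector not found"),
    422: ("VALIDATION_ERROR", "Validation error (invalid request)"),
}

def error_result(detector_name: str, http_status: int) -> DetectorResult:
    # Unmapped statuses fall back to a generic error label
    label, reason = STATUS_LABELS.get(http_status, ("ERROR", f"HTTP {http_status} error"))
    return DetectorResult(
        allowed=False,
        score=0.0,
        reason=reason,
        label=label,
        detector=detector_name,
        metadata={"http_status": http_status},
    )

print(error_result("toxicity", 404).label)  # NOT_FOUND
print(error_result("toxicity", 500).label)  # ERROR
```

A dict keeps the status handling in one place and makes adding new statuses a one-line change.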
scores = [d.get("score", 0.0) for d in detections]
return max(scores) if scores else 0.0

def _calculate_average_score(self, detections: List[Dict[str, Any]]) -> float:
I am not sure how informative it is to calculate average score across detectors; it might be best to remove _calculate_average_score
Fixed - Removed _calculate_average_score() and all average_score references from metadata.
I am not sure if the Colang flow definition is correct or if there is something wrong with the implementation
Re-running the same message gives me inconsistent outputs, e.g. sometimes
- variant 1
{"messages":[{"role":"assistant","content":"I'm sorry, but I couldn't process your request due to the following reason: Blocked by 2 Detections API detector(s): toxic-prompt-roberta-detector, ibm-hap-38m-detector. Please feel free to ask something else or try rephrasing your question."}]}
- variant 2:
{"messages":[{"role":"assistant","content":"Sorry, but I'm unable to assist with that request."}]}
- variant 3
{"messages":[{"role":"assistant","content":"This prompt is blocked by 2 Detections API detector(s): toxic-prompt-roberta-detector, ibm-hap-38m-detector"}]}
and so on; please investigate
Fixed - changed from bot refuse with message $variable to predefined bot response pattern. The issue was that variable interpolation in bot messages triggered LLM generation with inconsistent fallback paths. Now using define bot blocked by detector with static message, which provides deterministic responses. Verified with multiple identical requests - output is now consistent.
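For reference, a minimal Colang 1.0 sketch of the predefined-bot-response pattern described above (flow and message names are illustrative assumptions, not the actual guide content):

```colang
define bot blocked by detector
  "I'm sorry, I can't respond to that."

define flow detections api input rail
  $result = execute detections_api_check_all_detectors
  if not $result["allowed"]
    bot blocked by detector
    stop
```

Because the bot message is a static canonical form rather than an interpolated variable, no LLM generation is triggered and the refusal text is deterministic.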
}
)

def _extract_detections_from_response(
there may be opportunities to simplify _extract_detections_from_response, since I am not sure the API can return a flat array; please check
Simplified - removed flat array fallback since Detections API spec guarantees nested array structure (array of ContentsAnalysisResponse, which is itself an array).
return response

def _calculate_highest_score(self, detections: List[Dict[str, Any]]) -> float:
is this really necessary or is it possible to use e.g. in-built max function instead of the custom one?
Fixed - Removed _calculate_highest_score() and inlined with built-in max() function using default parameter.
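A quick illustration of the built-in max() with a default parameter replacing the custom helper (the sample detections are made up):

```python
detections = [{"score": 0.2}, {"score": 0.9}, {"score": 0.5}]

# Built-in max with default covers the empty case the custom helper handled
highest = max((d.get("score", 0.0) for d in detections), default=0.0)
print(highest)  # 0.9

# Empty input no longer needs a separate guard clause
empty_highest = max((d.get("score", 0.0) for d in []), default=0.0)
print(empty_highest)  # 0.0
```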
"average_score": average_score,
"individual_scores": individual_scores,
"highest_detection": highest_detection,
"detections": filtered_detections
L176 seems inconsistent with L149?
if metadata is needed, would it be better to have a consistent format where you always display all detectors and then just say pass / fail per detector?
Fixed - standardized metadata structure to be consistent across both cases and added "passed" boolean flag to each detection. The inconsistency was from initially treating BELOW_THRESHOLD as a simpler case, but I agree that consistent structure makes it easier for the end user.
    result = await client.detect(text)
    return result

except Exception as e:
is this dead code as detections_api.py already has try/except that always returns DetectorResult?
Fixed - kept exception handler to catch constructor validation errors (e.g., missing detector_id in ConfigMap). This ensures misconfigured detectors are gracefully marked as unavailable rather than crashing the entire action, allowing other detectors to continue running.
if isinstance(user_message, dict):
    user_message = user_message.get("content", "")

detections_api_detectors = getattr(
a safer access pattern or a proper existence check might be worth implementing here
Fixed - added explicit null-safety checks before accessing config.rails.config to prevent AttributeErrors if configuration chain is incomplete. Returns safe default (allowed: True) with warning log if any part of the config path is missing.
    f"{list(detections_api_detectors.keys())}"
)

tasks_with_names = [
Why store tuples then extract with task[1] and tasks_with_names[i][0]?
Fixed - replaced tuple pattern with separate lists for detector names and tasks. Now using zip(detector_names, results) for explicit association instead of index access (task[1], tasks_with_names[i][0]).
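A minimal sketch of the zip-based pairing (the detect call is stubbed out; names and result fields are illustrative):

```python
import asyncio

async def fake_detect(name: str) -> dict:
    """Stand-in for client.detect(text); fields are illustrative only."""
    return {"detector": name, "allowed": name != "toxicity"}

async def check_all(detector_names):
    tasks = [fake_detect(name) for name in detector_names]
    results = await asyncio.gather(*tasks)
    # zip keeps names and results explicitly paired, with no index arithmetic
    return {name: result for name, result in zip(detector_names, results)}

out = asyncio.run(check_all(["toxicity", "jailbreak"]))
print(out["toxicity"]["allowed"])   # False
print(out["jailbreak"]["allowed"])  # True
```

asyncio.gather preserves input order, so zip(detector_names, results) is a safe association.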
if not config:
    return {"allowed": False, "reason": "No configuration"}

user_message = context.get("user_message", "")
does this mean I can only set this up as an input guardrail?
what if I would like to also configure this for other message types?
Fixed - added support for output guardrails and multiple message types. The actions now automatically detect and check the appropriate message from context:
- user_message for input guardrails
- bot_message for output guardrails
The implementation uses a priority-based approach that works seamlessly in both input and output flows without requiring user configuration.
it seems to me that only input guardrails have been implemented; it would be good to also have this working as output guardrails and work on more than just user_message; perhaps check out how other providers handle this
Fixed - added support for output guardrails and multiple message types. The actions now automatically detect and check the appropriate message from context:
- user_message for input guardrails
- bot_message for output guardrails
is there a HTTP session leak in this file? please investigate
Fixed - added cleanup_http_session() function to properly close the shared aiohttp session during application shutdown. The global session uses lazy initialization with asyncio lock for thread-safe creation and connection pooling across all detector clients.
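The lazy double-checked initialization plus shutdown cleanup described above can be sketched as follows; a fake session class stands in for aiohttp.ClientSession so the example is self-contained:

```python
import asyncio

_session = None
_session_lock = asyncio.Lock()

class FakeSession:
    """Stand-in for aiohttp.ClientSession so the sketch runs without aiohttp."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

async def get_session():
    global _session
    if _session is None:                  # fast path avoids locking on every call
        async with _session_lock:
            if _session is None:          # double-check inside the lock
                _session = FakeSession()
    return _session

async def cleanup_http_session():
    """Close the shared session during application shutdown."""
    global _session
    if _session is not None:
        await _session.close()
        _session = None

async def demo():
    a = await get_session()
    b = await get_session()
    same = a is b                         # one shared session across all clients
    await cleanup_http_session()
    return same and a.closed

result = asyncio.run(demo())
print(result)  # True
```

The double-check matters because two coroutines can both observe `_session is None` before either acquires the lock; without the second check, the loser would overwrite the winner's session and leak it.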
m-misiura
left a comment
Please go through all the comments; most importantly:
- colang flow in the user guide does not appear to be quite correct (there is stochasticity in the outputs received when sending a request with the same input text)
- I think the current implementation only works on inputs; consider how this could be extended by looking at other providers
- there are some redundancies and unnecessary code; consider removing any dead code
- metadata -- I am not sure if all fields are needed especially things like average score across detectors
- pre commit should be run on all files
- consider adding some unit tests
548e85a to
05b9487
Compare
return None

class KServeDetectorConfig(BaseModel):
is KServeDetectorConfig used anywhere?
Fixed - KServeDetectorConfig was added in my local branch as part of earlier KServe V1 API work and got merged into this PR since both configurations share the same config.py file.
To keep a clean scope for this PR focused solely on Detections API integration, I've removed KServeDetectorConfig and the kserve_detectors field from this PR. These will be added back in the upcoming KServe refactor PR alongside the full implementation.
I think it might be a good idea to rename
    config: DetectionsAPIConfig with endpoint, detector_id, threshold, etc.
"""
super().__init__(config, detector_name)
self.detector_id = getattr(config, "detector_id", "")
why not self.detector_id = config.detector_id instead of self.detector_id = getattr(config, "detector_id", "")?
Fixed - I used getattr() with defaults as defensive programming to handle multiple config types flexibly. However, you're absolutely right that since we're using Pydantic BaseModel configs with validated fields, this defensive approach is unnecessary: Pydantic already ensures these fields exist and have valid values at config creation time.
I've updated both BaseDetectorClient and DetectionsAPIClient to use direct attribute access:
- config.inference_endpoint instead of getattr(config, "inference_endpoint", "")
- config.timeout instead of getattr(config, "timeout", 30)
- And so on for all config fields
I've kept config: Any in BaseDetectorClient to maintain extensibility (no need to modify base class when adding new detector types), while using the specific config: DetectionsAPIConfig type in DetectionsAPIClient for better type safety on detector-specific fields.
The Detections API supports batching multiple texts, but this implementation sends one text per request; is it worthwhile considering supporting this here?
If not, why not?
The Detections API does support batching multiple texts in a single request. However, I chose single-text processing because NeMo's execution model provides one message at a time to detector actions.
Looking at NeMo's architecture and existing detector implementations, actions are invoked once per conversation turn with a single message from context (user_message or bot_message). This pattern is consistent across all built-in detectors—jailbreak_detection, hallucination, content_safety, and others.
NeMo's event-driven architecture creates one UtteranceUserActionFinished event per user utterance, and detector actions execute within that flow receiving a single message. The real-time conversation flow doesn't present a natural batching opportunity where multiple messages would be simultaneously available to process.
Adding batch support would require changing the action signature to handle lists, updating the result aggregation logic, and determining how to source multiple texts within NeMo's per-turn execution model. Given that only one text is available per detector invocation in the current flow, I don't see clear benefits that would justify this additional complexity. What are your thoughts?
"detection_count": 0,
"total_detections": len(detections),
"individual_scores": [d.get("score", 0.0) for d in detections],
"highest_detection": max(detections, key=lambda d: d.get("score", 0.0), default={}),
is this going to work on Python 3.9 ? and if not, does it matter?
Yes, this code is compatible with Python 3.9. However, NeMo Guardrails dropped Python 3.9 support ahead of its EOL in October 2025. The current supported versions per pyproject.toml are Python 3.10, 3.11, 3.12, and 3.13, so I think Python 3.9 compatibility isn't a concern for this codebase.
"""
self.config = config
self.detector_name = detector_name
self.endpoint = getattr(config, "inference_endpoint", "")
self.endpoint = getattr(config, "inference_endpoint", "")
self.timeout = getattr(config, "timeout", 30)
self.api_key = getattr(config, "api_key", None)
If using Pydantic configs, should these be direct attribute access?
Fixed - Addressed in the previous comment
) as response:
    http_status = response.status

    if http_status == 200:
so all non-200 responses raise exceptions and become generic errors?
You're right—the current implementation in base.py raises a generic exception for all non-200 responses, which prevented the specific error handling (404 NOT_FOUND, 422 VALIDATION_ERROR) I added to parse_response() from being reached.
I've updated _call_endpoint() in base.py to return the status code for all HTTP responses instead of raising exceptions. Now:
- base.py handles only network/timeout errors (connection failures, timeouts)
- parse_response() in subclasses handles HTTP status codes with appropriate differentiation
- The specific error labels (NOT_FOUND, VALIDATION_ERROR) are now properly applied based on status code
This allows subclasses to implement status-code-specific error handling while keeping the base class generic.
request_headers.update(headers)

# Add auth if configured (per-detector key or global env var)
token = self.api_key or os.getenv("DETECTIONS_API_KEY")
token = self.api_key or os.getenv("DETECTIONS_API_KEY")

What would happen if I mounted a Secret as a volume? Would authentication fail?
Yes—the current implementation only checks self.api_key (from config) and the DETECTIONS_API_KEY environment variable. If a secret is mounted as a volume, authentication would fail since the code doesn't read from secret files.
I've updated the authentication logic to support file-based secrets. This now supports the standard OpenShift/Kubernetes pattern where secrets are mounted as files at a path specified by DETECTIONS_API_KEY_FILE, while maintaining backward compatibility with environment variables.
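A hedged sketch of the lookup order described above (the function name and exact precedence are illustrative assumptions; only the env var names come from the discussion):

```python
import os
import tempfile

def resolve_api_token(config_key=None):
    """Hypothetical lookup order: per-detector config key, env var, mounted secret file."""
    if config_key:
        return config_key
    token = os.getenv("DETECTIONS_API_KEY")
    if token:
        return token
    key_file = os.getenv("DETECTIONS_API_KEY_FILE")
    if key_file and os.path.isfile(key_file):
        with open(key_file) as f:
            # Secrets mounted as files often end with a newline; strip it
            return f.read().strip()
    return None

# Simulate a Kubernetes/OpenShift Secret mounted as a file
with tempfile.NamedTemporaryFile("w", suffix=".key", delete=False) as f:
    f.write("s3cr3t\n")
    secret_path = f.name

os.environ.pop("DETECTIONS_API_KEY", None)
os.environ["DETECTIONS_API_KEY_FILE"] = secret_path
token = resolve_api_token()
print(token)  # s3cr3t
os.unlink(secret_path)
```

Reading the file at request time (rather than once at startup) also picks up rotated secrets, at the cost of a small per-request stat.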
if _http_session is None:
    async with _session_lock:
        if _http_session is None:
            _http_session = aiohttp.ClientSession()
aiohttp.ClientSession() is currently created with no support for custom certs, is my understanding correct?
Yes, aiohttp.ClientSession() is created with default SSL settings and doesn't support custom CA certificates.
I initially considered leaving SSL configuration to be handled at the infrastructure level (service mesh, system trust store) to keep the detector client focused on the API integration logic. However, after some thought, adding SSL support at the application level provides more deployment flexibility—it works regardless of how the infrastructure is configured and makes local development easier when testing against services with self-signed certificates.
I've added SSL configuration support—the session now accepts custom CA certificates via DETECTIONS_API_CA_CERT environment variable (for Kubernetes secret volumes) while maintaining backward compatibility with default system certificates.
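A minimal sketch of the CA-bundle handling described above (the helper name is illustrative; only the DETECTIONS_API_CA_CERT variable comes from the discussion):

```python
import os
import ssl

def build_ssl_context():
    """Use DETECTIONS_API_CA_CERT as a CA bundle if set, else the system trust store."""
    ca_cert = os.getenv("DETECTIONS_API_CA_CERT")
    if ca_cert and os.path.isfile(ca_cert):
        # cafile replaces the default CAs with the mounted bundle
        return ssl.create_default_context(cafile=ca_cert)
    return ssl.create_default_context()

# The resulting context could then be handed to the HTTP client,
# e.g. aiohttp.TCPConnector(ssl=ctx) when building the ClientSession
ctx = build_ssl_context()
print(isinstance(ctx, ssl.SSLContext))  # True
```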
"""Aggregated result from multiple detectors"""

allowed: bool = Field(description="Whether content passed all detectors")
reason: str = Field(description="Summary of detection results")
I am not sure if reason: and unavailable_detectors: are strictly needed, what do you think?
Both fields are not strictly necessary but provide significant value for usability and maintainability:
reason field:
- Provides pre-formatted human-readable summaries (e.g., "Blocked by 2 detectors: toxicity, jailbreak")
- While this could be reconstructed from the blocking_detectors list, it would require every caller to duplicate the formatting logic
- Currently used for logging and makes future user-facing error messages trivial to implement
- Ensures consistent messaging format across all callers
unavailable_detectors field (more useful than the reason field):
- Separates infrastructure failures from content violations in a single field check
- Without it, callers would need to iterate through blocking_detectors and filter by error labels (e.g., d.label in ["ERROR", "TIMEOUT"])
- Used in Colang flows for fail-closed behavior when detectors are unavailable
- Makes the distinction between "detector down" vs "content blocked" explicit and immediate
While both could technically be derived from the existing detector result lists, having them as dedicated fields prevents code duplication and makes the common use cases (logging summaries, handling system errors) cleaner and less error-prone.
if not hasattr(config, "rails") or not hasattr(config.rails, "config"):
    log.warning("Configuration incomplete")
    return AggregatedDetectorResult(
        allowed=True,
Yes, changed to allowed=False to maintain fail-closed behavior on configuration errors. This ensures broken configurations block content rather than allowing it through.
I think there are inconsistencies in the allowing / blocking logic in this file; please take a look and streamline it.
At present, if I read it correctly, there are at least these options:
- No config at all -> allowed=False (block)
- Config exists but incomplete -> allowed=True (allow)
- Detector not in config -> allowed=True (allow)
- No text -> allowed=True (allow)
Not sure if this fail-open behaviour is the way to go
Fixed: I've streamlined the logic to consistently fail-closed:
Fixed behavior:
No config at all → allowed=False (unchanged)
Config incomplete → allowed=False (fixed - was True)
Detector not in config → allowed=False (fixed - was True)
No text → Removed early return (now flows through detectors)
Empty messages now go through the full detection pipeline as you suggested earlier. The API naturally returns "No detections found" for empty strings, resulting in allowed=True through the proper process rather than via shortcut.
All configuration/setup errors now consistently fail-closed for safety.
05b9487 to
f87cb9b
Compare
- Implement base interface pattern for extensible detector clients
- Add DetectionsAPIClient for v1/text/contents protocol
- Support configuration-driven detector management via ConfigMap
- Add comprehensive documentation and deployment guide
Code quality improvements:
- Standardize all imports to absolute paths for consistency
- Remove dead code and simplify helper methods per API spec
- Replace tuple pattern with explicit zip() for clarity

Enhanced error handling:
- Add HTTP status differentiation (404/422/500/503)
- Add null-safety checks for incomplete config chains
- Preserve constructor validation error handling

Output guardrails support:
- Add bot_message extraction for output rails
- Implement priority-based message type detection
- Support both input and output guardrail flows

Bug fixes:
- Fix inconsistent Colang responses with static messages
- Add cleanup_http_session() to prevent session leaks
- Standardize metadata structure with 'passed' flag

Testing:
- Add comprehensive unit tests
- Remove KServeDetectorConfig (moved to separate KServe refactor PR)
- Replace getattr() with direct attribute access for Pydantic configs
- Add file-based secret support (DETECTIONS_API_KEY_FILE for Kubernetes volumes)
- Add custom SSL certificate support (DETECTIONS_API_CA_CERT for OpenShift)
- Fix non-200 HTTP responses to allow subclass status differentiation
- Standardize fail-closed behavior for all configuration errors
- Remove commented empty message handling (now flows through detectors)
- Update documentation: SSL config, file-based secrets, authentication priority

All tests passing (109 tests, 97% coverage)
…lication

- Add SYSTEM_ERROR_LABELS constant to base.py (single source of truth)
- Add SERVER_ERROR handling for HTTP 5xx responses in detections_api.py
- Add config incomplete checks before accessing rails.config in actions.py
- Add detections_api_generate_block_message action for dynamic block messages
- Fix env var names in tests (DETECTOR_API_* instead of DETECTIONS_API_*)
- Update test assertion for SERVER_ERROR label on HTTP 500

All tests passing (109 tests)
bda557d to
3d09842
Compare
Description
This PR adds support for the Detections API v1/text/contents protocol, enabling NeMo Guardrails to communicate with external detector services that implement this standardized interface (e.g., TrustyAI guardrails-detectors, FMS Guardrails Orchestrator detectors).
Key Changes:

Base Interface Pattern: Introduced a BaseDetectorClient abstract class that eliminates code duplication when supporting multiple detector API protocols. Common logic (HTTP communication, session management, authentication, error handling) is shared, while API-specific logic (request/response formats) is isolated in subclass implementations.

Detections API Client: Implemented DetectionsAPIClient, which builds requests of the form {"contents": [text], "detector_params": {}} and parses nested responses of the form [[{detection1}, {detection2}]].

Configuration Support: Added DetectionsAPIConfig to RailsConfigData, enabling ConfigMap-driven detector management without code changes.

Action Functions: Implemented detections_api_check_all_detectors() and detections_api_check_detector() for NeMo rails.co integration, with parallel execution and proper error separation (system errors vs content violations).

Comprehensive Documentation: Added a deployment guide with a Granite Guardian HAP example, testing instructions, and a guide for adding new detectors.

Design Benefits: new detector protocols need to implement the build_request() and parse_response() methods only.

Testing Performed:
Related Issue(s)
Addresses the need for standardized detector API integration to support multiple detector service protocols (Detections API, KServe V1, future protocols) through a unified, extensible architecture.
Checklist